Identifying homes affected by the Texas energy crisis by census tract
MEDS
data science
R
spatial-analysis
Author
Affiliation
Colleen McCamy
MEDS
Published
November 15, 2022
introduction
In February 2021, the state of Texas was in a state of crisis when unusual winter storms left millions without power. Power outages pose an extra threat to customers who those who are medically vulnerable and marginalized groups and historically divested groups may be disproportionately impacted. 1
This blog post highlights an initial investigation estimating the number of homes in the Houston metropolitan area that lost power as a result of the first two storms and exploring if socioeconomic factors are predictors of outage areas.
The goal of the investigation was to practice using spatial data and associated functions and packages. Further investigation is needed to answer the questions explored in this practice and this investigation’s limitations are outlined in the conclusion section.
methods
This section illustrates the code executed to explore our questions.
Heading to the library
Get your library card ready because it is time to load these packages.
library(dplyr)
Attaching package: 'dplyr'
The following objects are masked from 'package:stats':
filter, lag
The following objects are masked from 'package:base':
intersect, setdiff, setequal, union
library(sf)
Linking to GEOS 3.10.2, GDAL 3.4.2, PROJ 8.2.1; sf_use_s2() is TRUE
library(stars)
Loading required package: abind
library(tmap)library(raster)
Loading required package: sp
Attaching package: 'raster'
The following object is masked from 'package:dplyr':
select
library(terra)
terra 1.6.17
library(tmap)library(ggplot2)
Step 1: Diving into the Data
What would be a data science project without the data? The following code block uses SQL and the stars package to load in the data needed for the project.
For our raster data we used NASA’s Worldview data for February 7, 2021 (before the power outage) and Februray 16, 2021 (during the power outage) to visualize the extent of the power outage in the Houston area. These days were selected as other days during the time frame had too much cloud cover to be useful.
#reading in the NASA raster filesnl_feb07_t1<-read_stars('/Users/colleenmccamy/Documents/MEDS/EDS_223_Spatial_Data/data/assignment3/VNP46A1/VNP46A1.A2021038.h08v05.001.2021039064328.tif')nl_feb07_t2 <-read_stars('/Users/colleenmccamy/Documents/MEDS/EDS_223_Spatial_Data/data/assignment3/VNP46A1/VNP46A1.A2021038.h08v06.001.2021039064329.tif')nl_feb16_t1 <-read_stars('/Users/colleenmccamy/Documents/MEDS/EDS_223_Spatial_Data/data/assignment3/VNP46A1/VNP46A1.A2021047.h08v05.001.2021048091106.tif')nl_feb16_t2 <-read_stars('/Users/colleenmccamy/Documents/MEDS/EDS_223_Spatial_Data/data/assignment3/VNP46A1/VNP46A1.A2021047.h08v06.001.2021048091105.tif')
Next up we have data from roads. To minimize accounting for light from major highways systems, we used publicly available geographic data from OpenStreetMap (OSM) through Geofabrik’s download sites. This data were downloaded and prepped in advance to contain a subset of highway and raods that intersect with the Houston metropolitan area. We also used data from Geofabrik’s download sites for information on houses in the Houston metropolitan area.
## ---- Highway Data# reading in highway data using SQL queryquery <-"SELECT * FROM gis_osm_roads_free_1 WHERE fclass='motorway'"# reading in the highways data with st_read()highways <-st_read("/Users/colleenmccamy/Documents/MEDS/EDS_223_Spatial_Data/data/assignment3/gis_osm_roads_free_1.gpkg", query = query)
Reading query `SELECT * FROM gis_osm_roads_free_1 WHERE fclass='motorway'' from data source `/Users/colleenmccamy/Documents/MEDS/EDS_223_Spatial_Data/data/assignment3/gis_osm_roads_free_1.gpkg'
using driver `GPKG'
Simple feature collection with 6085 features and 10 fields
Geometry type: LINESTRING
Dimension: XY
Bounding box: xmin: -96.50429 ymin: 29.00174 xmax: -94.39619 ymax: 30.50886
Geodetic CRS: WGS 84
## ---- Houses Data# defining the query for the housesquery_houses <-"SELECT * FROM gis_osm_buildings_a_free_1 WHERE (type IS NULL AND name IS NULL) OR type in ('residential', 'apartments', 'house', 'static_caravan', 'detached')"# reading in the highways data with st_read()houses <-st_read("/Users/colleenmccamy/Documents/MEDS/EDS_223_Spatial_Data/data/assignment3/gis_osm_buildings_a_free_1.gpkg", query = query_houses)
Reading query `SELECT * FROM gis_osm_buildings_a_free_1 WHERE (type IS NULL AND name IS NULL) OR type in ('residential', 'apartments', 'house', 'static_caravan', 'detached')' from data source `/Users/colleenmccamy/Documents/MEDS/EDS_223_Spatial_Data/data/assignment3/gis_osm_buildings_a_free_1.gpkg'
using driver `GPKG'
Simple feature collection with 475941 features and 5 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: -96.50055 ymin: 29.00344 xmax: -94.53285 ymax: 30.50393
Geodetic CRS: WGS 84
Lastly, we use data from the U.S. Census Bureau’s American Community Survey for census tracts in 2019 from an ArcGIS file geodatabase. The metadata for each layer is available at ACS metadata and this data was downloaded in advance to this investigation.
#reading in geometry data and selecting for the layer containing the geometrycensus_geom <-st_read("/Users/colleenmccamy/Documents/MEDS/EDS_223_Spatial_Data/data/assignment3/ACS_2019_5YR_TRACT_48_TEXAS.gdb", layer ="ACS_2019_5YR_TRACT_48_TEXAS")
Reading layer `ACS_2019_5YR_TRACT_48_TEXAS' from data source
`/Users/colleenmccamy/Documents/MEDS/EDS_223_Spatial_Data/data/assignment3/ACS_2019_5YR_TRACT_48_TEXAS.gdb'
using driver `OpenFileGDB'
Simple feature collection with 5265 features and 15 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: -106.6456 ymin: 25.83716 xmax: -93.50804 ymax: 36.5007
Geodetic CRS: NAD83
#reading in income dataincome_median <-st_read("/Users/colleenmccamy/Documents/MEDS/EDS_223_Spatial_Data/data/assignment3/ACS_2019_5YR_TRACT_48_TEXAS.gdb", layer ="X19_INCOME")
Reading layer `X19_INCOME' from data source
`/Users/colleenmccamy/Documents/MEDS/EDS_223_Spatial_Data/data/assignment3/ACS_2019_5YR_TRACT_48_TEXAS.gdb'
using driver `OpenFileGDB'
Warning: no simple feature geometries present: returning a data.frame or tbl_df
Step 2: Creating a blackout mask
The following code outlines how I created a blackout mask for the Houston area that I could use for identifying impacted homes.
# combing the data into single stars objects for each dayfeb16_tile <-st_mosaic(nl_feb16_t1, nl_feb16_t2)feb07_tile <-st_mosaic(nl_feb07_t1, nl_feb07_t2)# adding an indicator of the attributes in the datafeb_16_tile_names =setNames(feb16_tile, "light_16")feb_07_tile_names =setNames(feb07_tile, "light_07")# matrix alegbra to calculate the difference light difference between the two dates blackout_dif <- feb_07_tile_names - feb_16_tile_names# #filtering for the differences of a drop less that 200 nW cm-2sr-1 as NAblackout_mask <-cut(blackout_dif, c(200, Inf), labels ="outage")# vectorizing the blackout mask and fixing any invalid geometriesblackout_mask_v <-st_as_sf(blackout_mask) |>st_make_valid()# creating a polygon of Houston's coordinates hou_border <-st_polygon(list(rbind(c(-96.5,29), c(-96.5,30.5), c(-94.5, 30.5), c(-94.5,29), c(-96.5,29))))# converting to an sf object and identifying the coordinate reference systemhou_border_sf <-st_sfc(hou_border, crs ='EPSG:4326')# cropping the blackout mask with the Houston polygonhou_outage_mask_v <- blackout_mask_v[hou_border_sf, ,]# reprojectting the cropped object to a new crs and converting it as an sf objecthou_outage_mask_v_3083 <-st_transform(hou_outage_mask_v, crs ='EPSG:3083')outage_mask_clean <-st_as_sf(hou_outage_mask_v_3083)
Step 3: Excluding highway data
To exclude light from highways, we created a buffer of 200 meters and kept areas in our blackout mask that were greater than 200 meters away from a highway.
# selecting the highway geometry datahighways_geom <- highways$geom# transforming the highway geometries to the consistent crshighways_geom <-st_transform(highways_geom, crs ='EPSG:3083')# creating a buffer zone for highways geometry data of 200 metershighway_buffer <-st_buffer(x = highways_geom, dist =200)highway_buffer <-st_transform(highway_buffer, crs ='EPSG:3083')# combining the geometries into one and creating a mask that excludes the highway datahighway_buffer <-st_union(highway_buffer, by_feature =FALSE)mask_hou_highway <- outage_mask_clean[highway_buffer, , op = st_disjoint]
Step 4: Identifying Impacted Homes
To find the number of impacted homes, the code below outlines how I used the new blackout mask with the highway data to identify homes most likely impacted by the power outage.
# transforming houses data to be usablehouses <-st_transform(houses, crs ='EPSG:3083')houses_st <-st_as_sf(houses)# filtering the houses data with the blackout maskoutage_houses <- houses_st[mask_hou_highway, drop =FALSE]# identifying how many homes were affectedprint(paste0("There were ", nrow(outage_houses), " homes affected by the power outage on Feburary 16, 2021."))
[1] "There were 138652 homes affected by the power outage on Feburary 16, 2021."
Step 4: Investigating Socioeconomic factors
Now that we have information on the houses that were affected we can match this with the socioeconomic census tract information and determine which census tracts were impacted by the power outage.
# transforming the census data to be consistent with the crscensus_geom <-st_transform(census_geom, crs ='EPSG:3083')#selecting the necessary variables and renaming for clarityincome_med_select <- income_median |> dplyr::select("GEOID", "B19013e1") |>rename(GEOID_Data = GEOID, median_income = B19013e1)# changing the income object to a data_frameincome_med_select_df <-tibble(income_med_select)# joining census geometries and median income datacensus_data <-left_join(census_geom, income_med_select, by ="GEOID_Data")# transforming both objects to the correct crscensus_data <-st_transform(census_data, crs ='EPSG:4326')outage_houses <-st_transform(outage_houses, crs ='EPSG:4326')# filtering the census data using the outage houses and adding column indicating that these census tracts were part of a blackoutcensus_outage <- sf::st_filter(census_data, outage_houses) |>mutate(blackout ='yes')
Step 5: Comparing the incomes of impacted tracts and unimpacted tracts
It is time to visualize our findings. This code breaks down the data wrangling needed for the visualizations and the maps and plots created to compare which census tracts experienced a blackout vs the census tracts that did not experience a blackout.
## --- Wrangling our data for our visualizations -----------# transforming both objects to the crs 4326 to crop itcensus_data <-st_transform(census_data, crs ='EPSG:4326')hou_border_sf <-st_transform(hou_border_sf, crs ='EPSG:4326')# cropping the census data with the Houston border for filteringcensus_data_hou <- census_data[hou_border_sf, ,] # transforming census data back to the EPSG:3083 crscensus_data_hou <-st_transform(census_data_hou, crs ='EPSG:3083')# selecting necessary columns for houston census datacensus_data_hou <- census_data_hou |> dplyr::select("NAMELSAD", "Shape", "median_income", "GEOID_Data")# selecting necessary columns for outage data by census trackcensus_outage <- census_outage |> dplyr::select("blackout", "GEOID_Data")census_outage_map <- census_outage |> dplyr::select("blackout")# converting census outage data to a dataframe in order to joincensus_outage_df <-as.data.frame(census_outage)# joining census outage data and census data for all of Houstoncensus_map_data <-left_join(census_data_hou, census_outage_df, by ="GEOID_Data")census_map_data <- census_map_data |> dplyr::select('median_income', 'blackout')# converting census map data to a dataframe to plotcensus_plot_data <-data_frame(census_map_data)
Warning: `data_frame()` was deprecated in tibble 1.1.0.
Please use `tibble()` instead.
# adding an indicator for homes that didn't experience a blackoutcensus_plot_data <- census_plot_data |>mutate(blackout =replace(blackout, is.na(blackout), "no"))# creating a data frame for homes that experienced a blackout to plotcensus_plot_data_blackout <- census_plot_data |> dplyr::select("median_income", "blackout") |>filter(blackout =="yes")# creating a data frame for homes that didn't experienced a blackout to plotcensus_plot_data_no_blackout <- census_plot_data |> dplyr::select("median_income", "blackout") |>filter(blackout =="no")
## ------- Mapping our data ---------------# changing the view mode to be interactivetmap_mode("view")
tmap mode set to interactive viewing
# mapping median income by census track and identifying outages by dotstm_shape(census_map_data) +tm_polygons(col ="median_income",palette =c("#227c9d", "#17c3b2", "#ffcb77", "#ffe2b3", "#feb3b1", "#fe6d73"),textNA ="Missing Income Data", colorNA ="#e4ebea",title ="Median Income") +tm_shape(census_outage_map) +tm_dots(shape =1,title ='blackout') +tm_layout(main.title ="Houston Census Data by Income that Experienced A Power Outage",legend.outside =TRUE,main.title.size =1 )